iProLINK: an integrated protein resource for literature mining

Authors

  • Zhang-Zhi Hu
  • Inderjeet Mani
  • Vincent Hermoso
  • Hongfang Liu
  • Cathy H. Wu

Abstract

The exponential growth of large-scale molecular sequence data and of the PubMed scientific literature has prompted active research in biological literature mining and information extraction to facilitate genome and proteome annotation and to improve the quality of biological databases. Motivated by the promise of text mining methodologies, and also by the lack of adequate curated data for training and benchmarking, the Protein Information Resource (PIR) has developed iProLINK (integrated Protein Literature INformation and Knowledge), a resource for protein literature mining. Because PIR focuses its effort on curating the UniProt protein sequence database, the goal of iProLINK is to provide curated data sources for text mining research in four areas: bibliography mapping, annotation extraction, protein named entity recognition, and protein ontology development. The data sources for bibliography mapping and annotation extraction include mapped citations (PubMed IDs mapped to protein entries and feature lines) and annotation-tagged literature corpora. The latter include several hundred abstracts and full-text articles tagged with experimentally validated post-translational modifications (PTMs) annotated in the PIR protein sequence database. The data sources for entity recognition and ontology development include a protein name dictionary, word token dictionaries, protein name-tagged literature corpora with tagging guidelines, and a protein ontology based on PIRSF protein family names. iProLINK is freely accessible at http://pir.georgetown.edu/iprolink, with hypertext links to all downloadable files.
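
To illustrate how a protein name dictionary can support named entity recognition, here is a minimal sketch of greedy longest-match dictionary tagging over word tokens. It assumes a tiny hypothetical in-memory dictionary that maps lowercased names to human UniProtKB accessions chosen only for illustration; it does not reproduce the iProLINK file formats or any PIR tool, and the names `PROTEIN_NAMES`, `tokenize`, and `tag_protein_names` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary (NOT the iProLINK file format):
# lowercased, single-space-separated name -> human UniProtKB accession.
PROTEIN_NAMES: Dict[str, str] = {
    "insulin": "P01308",
    "tumor necrosis factor": "P01375",
    "p53": "P04637",
}

# Word tokens: alphanumeric runs, optionally joined by internal hyphens (e.g. "TNF-alpha").
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

Span = Tuple[int, int, str, str]  # (start offset, end offset, matched text, accession)


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split text into (token, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def tag_protein_names(
    text: str, dictionary: Dict[str, str], max_len: Optional[int] = None
) -> List[Span]:
    """Greedy longest-match lookup of dictionary names over word tokens."""
    tokens = tokenize(text)
    if max_len is None:
        # Longest dictionary entry, measured in tokens (0 for an empty dictionary).
        max_len = max((len(name.split()) for name in dictionary), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest window first so multi-word names beat their substrings.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tok.lower() for tok, _, _ in tokens[i : i + n])
            if candidate in dictionary:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                spans.append((start, end, text[start:end], dictionary[candidate]))
                i += n  # skip past the matched tokens
                matched = True
                break
        if not matched:
            i += 1  # no name starts here; move to the next token
    return spans


if __name__ == "__main__":
    sentence = "Insulin signaling and tumor necrosis factor both modulate p53 activity."
    for span in tag_protein_names(sentence, PROTEIN_NAMES):
        print(span)
```

Trying the longest window first lets a multi-word name such as "tumor necrosis factor" take precedence over any shorter entry it contains. A practical tagger would also have to handle name variants, abbreviations, and species ambiguity, which is the kind of problem the entity recognition resources described above are meant to support.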

Similar resources

The Nexus among Resource Based Theory, Marketing Strategy, and Firm Performance: An Integrated Framework

The purpose of this article is to present the link among resource based theory, marketing strategy, and firms’ performance in order to propose an integrative framework showing how the three constructs are linked. It is organized based on a review of academic literature on resource based theory and marketing strategy chronicled in major marketing journals up to December 2015. Besides, the paper ref...

MutD - A PubMed Scale Resource for Protein Mutation-Disease Relations through Bio-Medical Literature Mining

Text mining approaches can accelerate the process of assembling knowledge from literature. In this abstract, we present our effort in assembling a resource for protein mutation-disease relations assembled from literature. Keywords: literature mining; protein mutation-disease

pGenN, a Gene Normalization Tool for Plant Genes and Proteins in Scientific Literature

BACKGROUND: Automatically detecting gene/protein names in the literature and connecting them to database records, also known as gene normalization, provides a means to structure the information buried in free-text literature. Gene normalization is critical for improving the coverage of annotation in the databases, and is an essential component of many text mining systems and database curation p...

The Protein Information Resource: an integrated public resource of functional annotation of proteins

The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database ...

An online literature mining tool for protein phosphorylation

A web-based version of the RLIMS-P literature mining system was developed for online mining of protein phosphorylation information from MEDLINE abstracts. The online tool presents extracted phosphorylation objects (phosphorylated proteins, phosphorylation sites and protein kinases) in summary tables and full reports with evidence-tagged abstracts. The tool further allows mapping of phosphorylat...

Journal title:
  • Computational Biology and Chemistry

Volume 28, Issue 5-6

Publication date: 2004